Improving Statistical Machine Translation using Lexicalized Rule Selection
نویسندگان
چکیده
This paper proposes a novel lexicalized approach for rule selection for syntax-based statistical machine translation (SMT). We build maximum entropy (MaxEnt) models which combine rich context information for selecting translation rules during decoding. We successfully integrate the MaxEnt-based rule selection models into the state-of-the-art syntax-based SMT model. Experiments show that our lexicalized approach for rule selection achieves statistically significant improvements over the state-of-the-art SMT system.
منابع مشابه
A Lexicalized Reordering Model for Hierarchical Phrase-based Translation
Lexicalized reordering model plays a central role in phrase-based statistical machine translation systems. The reordering model specifies the orientation for each phrase and calculates its probability conditioned on the phrase. In this paper, we describe the necessity and the challenge of introducing such a reordering model for hierarchical phrase-based translation. To deal with the challenge, ...
متن کاملUse of Rich Linguistic Information to Translate Prepositions and Grammatical Cases to Basque
This paper presents three successful techniques to translate prepositions heading verbal complements by means of rich linguistic information, in the context of a rule-based Machine Translation system for an agglutinative language with scarce resources. This information comes in the form of lexicalized syntactic dependency triples, verb subcategorization and manually coded selection rules based ...
متن کاملGeneralized Reordering Rules for Improved SMT
We present a simple yet effective approach to syntactic reordering for Statistical Machine Translation (SMT). Instead of solely relying on the top-1 best-matching rule for source sentence preordering, we generalize fully lexicalized rules into partially lexicalized and unlexicalized rules to broaden the rule coverage. Furthermore, , we consider multiple permutations of all the matching rules, a...
متن کاملLeft-to-Right Hierarchical Phrase-based Machine Translation
Hierarchical phrase-based translation (Hiero for short) models statistical machine translation (SMT) using a lexicalized synchronous context-free grammar (SCFG) extracted from word aligned bitexts. The standard decoding algorithm for Hiero uses a CKY-style dynamic programming algorithm with time complexity O(n3) for source input with n words. Scoring target language strings using a language mod...
متن کاملBayesian Reordering Model with Feature Selection
In phrase-based statistical machine translation systems, variation in grammatical structures between source and target languages can cause large movements of phrases. Modeling such movements is crucial in achieving translations of long sentences that appear natural in the target language. We explore generative learning approach to phrase reordering in Arabic to English. Formulating the reorderi...
متن کامل